Reliable Cluster Computing with a New Checkpointing RAID-x Architecture
Abstract
In a serverless cluster of PCs or workstations, the cluster must support remote file access and parallel I/O performed directly over disks distributed across all client nodes. We introduce a new distributed disk array, called RAID-x, for use in serverless clusters. The RAID-x architecture is based on an orthogonal striping and mirroring (OSM) scheme, which exploits full bandwidth and protects the system from any single disk failure. In experiments in a Linux cluster environment, RAID-x outperforms both RAID-1 and NFS. We also propose a new striped checkpointing scheme that leverages striped parallelism and pipelined writing of successive disk stripes. The RAID-x architecture greatly enhances the throughput, reliability, and availability of scalable clusters, and it appeals especially to I/O-centric cluster applications.
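The abstract does not spell out the OSM block mapping, so the following is only a minimal sketch under an assumed placement rule: data blocks are striped round-robin across n disks, and the mirror images of each group of n-1 consecutive blocks are clustered on the one disk that holds none of those blocks. The function names `data_disk`, `mirror_disk`, and `readable_after_failure` are hypothetical and are not taken from the paper.

```python
def data_disk(block: int, n: int) -> int:
    """Disk holding the data copy of a block (assumed round-robin striping)."""
    return block % n


def mirror_disk(block: int, n: int) -> int:
    """Disk holding the mirror image of a block (assumed placement rule).

    Blocks are grouped n-1 at a time. The data copies of one group occupy
    n-1 distinct disks, and all of the group's images go to the remaining disk.
    """
    group = block // (n - 1)
    return (n - 1 - group) % n


def readable_after_failure(block: int, failed: int, n: int) -> bool:
    """True if a copy of the block survives the loss of one disk."""
    return data_disk(block, n) != failed or mirror_disk(block, n) != failed


if __name__ == "__main__":
    n = 4
    for b in range(12):
        print(f"block {b}: data on disk {data_disk(b, n)}, "
              f"image on disk {mirror_disk(b, n)}")
```

Under this assumed rule a block and its image never share a disk, which is the property behind tolerating any single disk failure. Because the images of a group sit together on one disk, they could be written as one long sequential write, while the data copies of the group are spread over the other disks and can be accessed in parallel.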
Similar resources
RAID-x: A New Distributed Disk Array for I/O-Centric Cluster Computing
A new RAID-x (redundant array of inexpensive disks at level x) architecture is presented for distributed I/O processing on a serverless cluster of computers. The RAID-x architecture is based on a new concept of orthogonal striping and mirroring (OSM) across all distributed disks in the cluster. The primary advantages of this OSM approach lie in: (1) a significant improvement in parallel I/O ban...
Distributed Software RAID Architectures for Parallel I/O in Serverless Clusters
In a serverless cluster of computers, all local disks can be integrated as a distributed software RAID (ds-RAID) with a single I/O space. This paper presents the architecture and performance of a new RAID-x for building ds-RAID. Through experimentation, we evaluate the RAID-x along with RAID-5, chained-declustering, and RAID-10 architectures, all embedded in a Linux cluster environment. All fou...
Designing SSI clusters with hierarchical checkpointing and single I/O space
(SSI) in a workstation cluster. In a cluster of computers, local area networks or high-bandwidth switch networks using optical fibers physically connect a collection of node computers. The workstations in a cluster can work collectively as an integrated computing resource (that is, an SSI), or they can operate separately as individual computers. Present clusters are usually small and provide only ...
Orthogonal Striping and Mirroring in Distributed RAID for I/O-Centric Cluster Computing
This paper presents a new distributed disk-array architecture for achieving high I/O performance in scalable cluster computing. In a serverless cluster of computers, all distributed local disks can be integrated as a distributed-software redundant array of independent disks (ds-RAID) with a single I/O space. We report the new RAID-x design and its benchmark performance results. The advantage o...
An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...